NSF PAR Search | NSF Public Access Repository

Investigating grade-level and text genre effects in Quality Talk discussions: An AI-powered discourse analysis of upper primary students’ high-level comprehension

https://doi.org/10.1016/j.learninstruc.2025.102208

Firetto, Carla M; Murphy, P Karen; Starrett, Emily; Herman, Emilee A; Greene, Jeffrey A; Tang, Yue; Yan, Lin (December 2025, Learning and Instruction)

Full Text Available
MTrain: Enable Efficient CNN Training on Heterogeneous FPGA-Based Edge Servers

https://doi.org/10.1109/TCAD.2025.3541486

Tang, Yue; Jones, Alex K; Xiong, Jinjun; Zhou, Peipei; Hu, Jingtong (January 2025, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems)

FPGA-based edge servers are used in many applications in smart cities, hospitals, retail, etc. Equipped with heterogeneous FPGA-based accelerator cards, the servers can run multiple tasks, including efficient video preprocessing, machine learning algorithm acceleration, etc. These servers are required to perform inference during the daytime while re-training the model during the night to adapt to new environments, domains, or users. During re-training, the incoming data are conventionally transmitted to the cloud, and the updated machine learning models are then transferred back to the edge server. Such a process is inefficient and cannot protect users’ privacy, so it is desirable for the models to be trained directly on the edge servers. Deploying convolutional neural network (CNN) training on heterogeneous resource-constrained FPGAs is challenging since it needs to consider both the complex data dependency of the training process and the communication bottleneck among different FPGAs. Previous multi-accelerator training algorithms select optimal scheduling strategies for data parallelism, tensor parallelism, and pipeline parallelism. However, pipeline parallelism cannot deal with batch normalization (BN), which is an essential CNN operator, while purely applying data parallelism and tensor parallelism suffers from resource under-utilization and intensive communication costs. In this work, we propose MTrain, a novel multi-accelerator training scheduling strategy that transforms the training process into a multi-branch workflow, so that independent sub-operations of different branches are executed on different training accelerators in parallel for better utilization and reduced communication overhead. Experimental results show that we can achieve efficient CNN training on heterogeneous FPGA-based edge servers with 1.07x-2.21x speedup under 15 GB/s peer-to-peer bandwidth compared to the state-of-the-art work.
Full Text Available
CHEF: A Framework for Deploying Heterogeneous Models on Clusters With Heterogeneous FPGAs

https://doi.org/10.1109/TCAD.2024.3438994

Tang, Yue; Song, Yukai; Elango, Naveena; Priya, Sheena Ratnam; Jones, Alex K; Xiong, Jinjun; Zhou, Peipei; Hu, Jingtong (November 2024, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems)

Full Text Available
CHEF: A Framework for Deploying Heterogeneous Models on Clusters with Heterogeneous FPGAs

Tang, Yue; Song, Yukai; Elango, Naveena; Priya, Sheena R; Jones, Alex K; Xiong, Jinjun; Zhou, Peipei; Hu, Jingtong (October 2024, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems)

DNNs are rapidly evolving from streamlined single-modality single-task (SMST) to multi-modality multi-task (MMMT) with large variations for different layers and complex data dependencies among layers. To support such models, hardware systems have also evolved to be heterogeneous. The heterogeneous system comes from the prevailing trend to integrate diverse accelerators into the system for lower latency. FPGAs have high computation density and communication bandwidth and are configurable to be deployed with different designs of accelerators, so they are widely used for various machine-learning applications. However, scaling from SMST to MMMT on heterogeneous FPGAs is challenging since MMMT has much larger layer variations, a massive number of layers, and complex data dependency among different backbones. Previous mapping algorithms are either inefficient or over-simplified, which makes them impractical in general scenarios. In this work, we propose CHEF to enable efficient implementation of MMMT models in realistic heterogeneous FPGA clusters, i.e., deploying heterogeneous accelerators on heterogeneous FPGAs (A2F) and mapping the heterogeneous DNNs on the deployed heterogeneous accelerators (M2A). We propose CHEF-A2F, a two-stage accelerators-to-FPGAs deployment approach to co-optimize hardware deployment and accelerator mapping. In addition, we propose CHEF-M2A, which can support general and practical cases compared to previous mapping algorithms. To the best of our knowledge, this is the first attempt to implement MMMT models in real heterogeneous FPGA clusters. Experimental results show that the latency obtained with CHEF is near-optimal while the search time is 10000X less than exhaustively searching for the optimal solution.
Full Text Available
Unconventional superconducting phase diagram of monolayer $\mathrm{WTe}_{2}$

https://doi.org/10.1103/PhysRevResearch.7.013224

Song, Tiancheng; Jia, Yanyu; Yu, Guo; Tang, Yue; Uzan, Ayelet J; Zheng, Zhaoyi Joy; Guan, Haosen; Onyszczak, Michael; Singha, Ratnadwip; Gui, Xin; et al (February 2025, Physical Review Research)

The existence of a quantum critical point (QCP) and fluctuations around it are believed to be important for understanding the phase diagram in unconventional superconductors such as cuprates, iron pnictides, and heavy fermion superconductors. However, the QCP is usually buried deep within the superconducting dome and is difficult to investigate. The connection between quantum critical fluctuations and superconductivity remains an outstanding problem in condensed matter. Here, combining both electrical transport and Nernst experiments, we explicitly demonstrate the onset of superconductivity at an unconventional QCP in gate-tuned monolayer tungsten ditelluride ($\mathrm{WTe}_{2}$), with features incompatible with the conventional Bardeen-Cooper-Schrieffer scenario. The results lead to a superconducting phase diagram that is distinguished from other known superconductors. Two distinct gate-tuned quantum phase transitions are observed at the ends of the superconducting dome. We find that quantum fluctuations around the QCP of the underdoped regime are essential for understanding how the monolayer superconductivity is established. The unconventional phase diagram we report here illustrates a previously unknown relation between superconductivity and QCP. Published by the American Physical Society 2025
Full Text Available
Anomalous superconductivity in twisted MoTe₂ nanojunctions

https://doi.org/10.1126/sciadv.adq5712

Jia, Yanyu; Song, Tiancheng; Zheng, Zhaoyi Joy; Cheng, Guangming; Uzan, Ayelet J; Yu, Guo; Tang, Yue; Pollak, Connor J; Yuan, Fang; Onyszczak, Michael; et al (January 2025, Science Advances)

Introducing superconductivity in topological materials can lead to innovative electronic phases and device functionalities. Here, we present a unique strategy for quantum engineering of superconducting junctions in moiré materials through direct, on-chip, and fully encapsulated 2D crystal growth. We achieve robust and designable superconductivity in Pd-metalized twisted bilayer molybdenum ditelluride (MoTe₂) and observe anomalous superconducting effects in high-quality junctions across ~20 moiré cells. Unexpectedly, the junction develops enhanced, instead of weakened, superconducting behaviors, exhibiting fluctuations to a higher critical magnetic field compared to its adjacent Pd₇MoTe₂ superconductor. In addition, the critical current further exhibits a notable V-shaped minimum at zero magnetic field. These features are unexpected in conventional Josephson junctions and absent in junctions of natural bilayer MoTe₂ created using the same approach. We discuss implications of these observations, including the possible formation of mixed even- and odd-parity superconductivity at the moiré junctions. Our results also demonstrate a pathway to engineer and investigate superconductivity in fractional Chern insulators.
Full Text Available
Superconductivity from On-Chip Metallization on 2D Topological Chalcogenides

https://doi.org/10.1103/PhysRevX.14.021051

Jia, Yanyu; Yu, Guo; Song, Tiancheng; Yuan, Fang; Uzan, Ayelet J; Tang, Yue; Wang, Pengjie; Singha, Ratnadwip; Onyszczak, Michael; Zheng, Zhaoyi Joy; et al (June 2024, Physical Review X)

Two-dimensional (2D) transition metal dichalcogenides (TMDs) are a versatile class of quantum materials of interest to various fields including, e.g., nanoelectronics, optical devices, and topological and correlated quantum matter. Tailoring the electronic properties of TMDs is essential to their applications in many directions. Here, we report that a highly controllable and uniform on-chip 2D metallization process converts a class of atomically thin TMDs into robust superconductors, a property belonging to none of the starting materials. As examples, we demonstrate the introduction of superconductivity into a class of 2D air-sensitive topological TMDs, including monolayers of $T_{d}$-$\mathrm{WTe}_{2}$, $1T'$-$\mathrm{MoTe}_{2}$, and $2H$-$\mathrm{MoTe}_{2}$, as well as their natural and twisted bilayers, metallized with an ultrathin layer of palladium. This class of TMDs is known to exhibit intriguing topological phases ranging from topological insulator and Weyl semimetal to fractional Chern insulator. The unique, high-quality two-dimensional metallization process is based on our recent findings of the long-distance, non-Fickian in-plane mass transport and chemistry in 2D that occur at relatively low temperatures and in devices fully encapsulated with inert insulating layers. Highly compatible with existing nanofabrication techniques for van der Waals stacks, our results offer a route to designing and engineering superconductivity and topological phases in a class of correlated 2D materials. Published by the American Physical Society 2024
Full Text Available
Enabling Weakly Supervised Temporal Action Localization From On-Device Learning of the Video Stream

https://doi.org/10.1109/TCAD.2022.3197536

Tang, Yue; Wu, Yawen; Zhou, Peipei; Hu, Jingtong (November 2022, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems)

Full Text Available
EF-Train: Enable Efficient On-device CNN Training on FPGA through Data Reshaping for Online Adaptation or Personalization

https://doi.org/10.1145/3505633

Tang, Yue; Zhang, Xinyi; Zhou, Peipei; Hu, Jingtong (September 2022, ACM Transactions on Design Automation of Electronic Systems)

Conventionally, DNN models are trained once in the cloud and deployed in edge devices such as cars, robots, or unmanned aerial vehicles (UAVs) for real-time inference. However, there are many cases that require the models to adapt to new environments, domains, or users. In order to realize such domain adaptation or personalization, the models need to be continuously trained on the device. In this work, we design EF-Train, an efficient DNN training accelerator with a unified channel-level parallelism-based convolution kernel that can achieve end-to-end training on resource-limited low-power edge-level FPGAs. It is challenging to implement on-device training on resource-limited FPGAs due to the low efficiency caused by different memory access patterns among forward propagation, backward propagation, and weight update. Therefore, we developed a data reshaping approach with intra-tile continuous memory allocation and weight reuse. An analytical model is established to automatically schedule computation and memory resources to achieve high energy efficiency on edge FPGAs. The experimental results show that our design achieves 46.99 GFLOPS and 6.09 GFLOPS/W in terms of throughput and energy efficiency, respectively.
Full Text Available
